Concept Mining and Inner Relationship Discovery from Text
Abstract
From a cognitive point of view, knowing concepts is a fundamental ability by which human beings understand the world. Most concepts can be lexicalized via words in a natural language and are called Lexical Concepts. Currently, there is much interest in automatic knowledge acquisition from text, in which concept extraction, verification, and relationship discovery are the crucial parts (Cao et al., 2002). A wide range of other applications can also benefit from concept acquisition, including information retrieval, text classification, and Web searching (Ramirez & Mattmann, 2004; Zhang et al., 2004; Acquemin & Bourigault, 2000). Most related efforts in concept mining are concentrated on term recognition. The commonly used approaches are based on linguistic rules (Chen et al., 2003), statistics (Zheng & Lu, 2005; Agirre et al., 2004), or a combination of both (Du et al., 2005; Velardi et al., 2001). In our research, we recognize that concepts are not just terms. Terms are domain-specific, while concepts are general-purpose. Furthermore, terms are restricted to a few kinds of concepts, such as named entities. So even though we can benefit a lot from term recognition, we cannot use it to learn concepts directly. Other relevant work in concept mining focuses on concept extraction from documents. Gelfand developed a method based on the Semantic Relation Graph to extract concepts from a whole document (Gelfand et al., 1998). Nakata described a method to index important concepts described in a collection of documents belonging to a group, so that the group can share them (Nakata et al., 1998). A major difference between their work and ours is that we want to learn a huge number of concepts from a large-scale raw corpus efficiently, rather than from one or several documents. Analyzing whole documents would lead to very high time complexity and does not work for our purpose.
There are many types of relationships between lexical concepts, such as antonymy, meronymy, and hyponymy, among which the hyponymy relationship has attracted much research effort because of its wide use. There are three mainstream approaches (the Symbolic approach, the Statistical approach, and the Hierarchical approach) to discover general ...
Related resources
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
A Model for Extracting Information from Text Documents, Based on Text Mining in the E-Learning Domain
As computer networks become the backbone of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
Towards Multilingual Information Discovery through a SOM based Text Mining approach
Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most current text mining tools focus on processing monolingual documents only (particularly English documents); little attention has been paid to applying the techniques to handle the document...
A Survey Paper on Concept Mining in Text Documents
1. Berry Michael W., (2004), “Automatic Discovery of Similar Words”, in “Survey of Text Mining: Clustering, Classification and Retrieval”, Springer Verlag, New York, LLC, 24-43. 2. Navathe, Shamkant B., and Elmasri Ramez, (2000), “Data Warehousing and Data Mining”, in “Fundamentals of Database Systems”, Pearson Education Pvt. Inc., Singapore, 841-872. 3. Haralampos Karanikas and Babis Theodoulidis Ma...
Publication date: 2012